When deciding whether to invest in or partner with another firm, being able to accurately predict whether that firm will go bankrupt is invaluable for managing risk. This prediction is difficult for a human to make unaided, and machine learning can make the task feasible. Three separate models are tested and compared: Logistic Elastic-net, Boosted Forest, and SVM. This project focuses on three main tasks:
1. Preparing the Data
    Correcting Class Imbalance
    Feature Engineering and Normalization
2. Model Tuning
    Grid Specification
    Cross-Validation
3. Model Selection and Defining Success
    Performance Measure Selection
    Reasoning for Selected Model
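To illustrate why performance measure selection gets its own step, the sketch below shows how accuracy can mislead when one class is rare. The labels are made up for illustration (they are not taken from the project's data), and the helper names `confusion_counts` and `summarize` are hypothetical. It is only a minimal example of the general issue, not the measure this project ultimately selects.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = bankrupt."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute accuracy, recall, precision and F1, guarding against division by zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


# Made-up labels for illustration only: 95 healthy firms (0), 5 bankrupt firms (1).
y_true = [0] * 95 + [1] * 5

# A useless "model" that predicts every firm is healthy.
always_healthy = [0] * 100

scores = summarize(y_true, always_healthy)
print(scores)
```

In this toy case the always-healthy predictor scores high on accuracy while its recall on the bankrupt class is zero, which is why a measure that is sensitive to the minority class is usually preferred for this kind of task.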
The data for this project comes from Kaggle and covers 6819 companies in Taiwan over the period 1999 to 2009. The dependent variable ‘bankrupt’ is binary: 1 means the company went bankrupt and 0 means it did not. There are 95 independent variables, ranging from ‘return on assets’ to ‘cash flow rate’.
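With a binary target like ‘bankrupt’, one common way to correct class imbalance is random oversampling of the rarer class. The sketch below shows that idea on a handful of made-up rows. The function name `oversample_minority` and the toy values are assumptions for illustration, and this is not necessarily the correction method used in this project.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present to oversample")
    # Sample minority indices with replacement to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    indices = majority + minority + extra
    rng.shuffle(indices)
    return [rows[i] for i in indices], [labels[i] for i in indices]


# Toy data for illustration only: two made-up features per firm.
rows = [[0.50, 0.46], [0.48, 0.47], [0.52, 0.45], [0.49, 0.46],
        [0.51, 0.47], [0.47, 0.44], [0.30, 0.40], [0.28, 0.41]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]  # 1 = bankrupt, 0 = not bankrupt

balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(sum(balanced_labels), len(balanced_labels))
```

If a correction like this is used together with cross-validation, it should be applied to the training folds only, so that duplicated rows do not leak into the data used for evaluation.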